# Real-time audio processing
Pyannote Segmentation
MIT
This is a speaker segmentation model based on powerset encoding. It processes 10-second audio clips and identifies multiple speakers, including overlapping speech.
Audio Processing
it-just-works
771
0
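Powerset encoding turns multi-speaker segmentation into an ordinary single-label classification problem: each audio frame is assigned exactly one class, and each class stands for one set of simultaneously active speakers (including "nobody"). The sketch below illustrates the idea under an assumed configuration of 3 speakers per chunk with at most 2 overlapping, which gives 7 classes. The function names are hypothetical and are not part of the pyannote API.

```python
from itertools import combinations

# Hypothetical helpers for illustration only; this is not pyannote's API.
# Assumption: up to 3 speakers per chunk, at most 2 speaking at once.


def build_powerset_classes(num_speakers: int, max_overlap: int) -> list[frozenset[int]]:
    """Enumerate every allowed set of simultaneously active speakers.

    Class 0 is the empty set (non-speech), then single speakers, then pairs, etc.
    """
    classes: list[frozenset[int]] = []
    for k in range(max_overlap + 1):
        for combo in combinations(range(num_speakers), k):
            classes.append(frozenset(combo))
    return classes


def encode_frame(activity: list[int], classes: list[frozenset[int]]) -> int:
    """Map a multi-label activity vector (one 0/1 flag per speaker) to one class index."""
    active = frozenset(i for i, flag in enumerate(activity) if flag)
    return classes.index(active)  # ValueError if more speakers overlap than allowed


def decode_frames(frame_scores: list[list[float]], classes: list[frozenset[int]],
                  num_speakers: int) -> list[list[int]]:
    """Pick the highest-scoring class per frame and expand it into per-speaker flags."""
    decoded = []
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        active = classes[best]
        decoded.append([1 if s in active else 0 for s in range(num_speakers)])
    return decoded


classes = build_powerset_classes(num_speakers=3, max_overlap=2)
print(len(classes))  # 7
print(encode_frame([1, 0, 1], classes))  # 5, the class for {speaker 0, speaker 2}
scores = [
    [0.9, 0.05, 0.02, 0.01, 0.01, 0.005, 0.005],  # mostly non-speech
    [0.1, 0.1, 0.05, 0.05, 0.6, 0.05, 0.05],      # speakers 0 and 1 overlap
]
print(decode_frames(scores, classes, num_speakers=3))  # [[0, 0, 0], [1, 1, 0]]
```

Because the classes are mutually exclusive, a single softmax and argmax per frame is enough, and overlap falls out of decoding a pair class back into two active speakers.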
Speaker Diarization 2.5
MIT
A speaker diarization model adapted from pyannote/speaker-diarization-3.0 that uses speechbrain/spkrec-ecapa-voxceleb for speaker embeddings, which reportedly improves performance in certain tests.
Speaker Analysis
Willy030125
26
0
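A diarization pipeline built on a speaker-embedding model such as speechbrain/spkrec-ecapa-voxceleb typically embeds each speech segment and then groups segments whose embeddings are similar, so that each group corresponds to one speaker. The sketch below shows this with a plain centroid-linkage agglomerative clustering on cosine similarity. It is a generic illustration, not the clustering code of any pyannote release; the threshold of 0.7 and the 2-D toy vectors are hypothetical (real speaker embeddings are much higher-dimensional).

```python
import math

# Hypothetical helpers for illustration; not the clustering used by pyannote.


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cluster_embeddings(embeddings: list[list[float]], threshold: float = 0.7) -> list[int]:
    """Repeatedly merge the two clusters whose centroids are most similar,
    stopping once no pair reaches `threshold`. Returns one label per segment."""
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        centroids = [centroid([embeddings[i] for i in members]) for members in clusters]
        best_pair, best_sim = None, -math.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine_similarity(centroids[a], centroids[b])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        if best_sim < threshold:
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy "embeddings": segments 0-1 resemble one voice, segments 2-3 another.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(cluster_embeddings(segments))  # [0, 0, 1, 1]
```

The stopping threshold is what decides the number of speakers here; a lower value merges more aggressively and yields fewer speakers.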
Whisper Large V3 Turbo Russian
MIT
A Russian automatic speech recognition (ASR) model based on OpenAI Whisper Large V3 Turbo, fine-tuned on the Russian portion of the Mozilla Common Voice 17 dataset.
Speech Recognition
Transformers Other
dvislobokov
1,022
12
Voice Gender Classifier
MIT
A pre-trained model based on the ECAPA-TDNN architecture that classifies a speaker's gender from human speech.
Audio Classification
Transformers
JaesungHuh
14.01k
16
Pyannote Segmentation 30
MIT
This is an audio processing model for speaker diarization, capable of detecting speech activity, overlapping speech, and multiple speakers.
Audio Processing
collinbarnwell
873
0
Faster Whisper Large V3
MIT
Whisper large-v3 is a large-scale multilingual automatic speech recognition (ASR) model developed by OpenAI for speech-to-text transcription in many languages.
Speech Recognition Supports Multiple Languages
Systran
713.48k
376
Speaker Diarization 3.1
MIT
An audio processing model for speaker segmentation that can automatically detect and segment different speakers in audio.
Speaker Analysis
pyannote
11.7M
822
Segmentation 3.0
MIT
This is a powerset-encoded speaker diarization model capable of processing 10-second audio clips to identify multiple speakers and their overlapping speech.
Speaker Analysis
pyannote
12.6M
445
Faster Whisper Large V2
MIT
This is OpenAI's Whisper large-v2 model converted to the CTranslate2 format for efficient speech recognition.
Speech Recognition Supports Multiple Languages
guillaumekln
161.19k
199
Pyannote Speaker Diarization Endpoint
MIT
A speaker diarization model based on pyannote.audio 2.0 that automatically detects speaker changes and speech activity in audio.
Speaker Analysis
philschmid
51
18
Wav2vec2 Large Xlsr 53 Spanish
Apache-2.0
A large-scale cross-lingual speech recognition model based on the Wav2Vec2 architecture, fine-tuned for Spanish and released by Facebook.
Speech Recognition Spanish
facebook
66.63k
20
Fasnettac Paper
An audio separation model trained with the Asteroid framework, designed to separate noisy multi-channel audio signals.
Sound Separation
popcornell
21
3
Convtasnet Libri1Mix Enhsingle
A ConvTasNet model trained with the Asteroid framework for single-channel speech enhancement.
Audio Enhancement
mhu-coder
18
1
Quran Speech Recognizer
An Arabic speech recognition system built with transfer learning, designed to recognize Quran recitations and identify the corresponding chapters.
Speech Recognition
Transformers
Nuwaisir
123
9